DERIVATIVES OF HUMAN BILE-SALT STIMULATED LIPASE, AND
PHARMACEUTICAL COMPOSITIONS CONTAINING THEM

Field of the invention 

The present invention relates to new DNA sequences, to new
proteins coded for by such DNA sequences, and to the use
of such proteins as further described below. The invention
also encompasses vectors, such as plasmid constructs,
comprising such DNA sequences and capable of expressing
the desired enzyme. The invention also includes host
organisms transfected with such constructs, e.g. bacteria,
yeast, mammalian cells, and transgenic animals. The
invention also includes processes for the preparation of
the novel products of the invention. The new proteins of
the invention are related to an enzyme known, inter alia, as
human bile salt-stimulated lipase.
Background to the invention

The human lactating mammary gland synthesizes and secretes
with the milk a bile salt-stimulated lipase (BSSL) [1]
that, after specific activation by primary bile salts [2,
57, 58], contributes to the breast-fed infant's endogenous
capacity for intestinal fat digestion [3-5]. This enzyme,
which accounts for approximately 1% of total milk protein
[6], is a non-specific lipase; in vitro it hydrolyses not
only tri-, di- and monoacylglycerols, but also
cholesteryl and retinyl esters, and
lysophosphatidylglycerols [7-10]. Furthermore, its
activity is not restricted to emulsified substrates;
micellar and soluble substrates are hydrolyzed at similar
rates [11].

BSSL is not degraded during passage with the milk through
the stomach, and in duodenal contents it is protected by
bile salts from inactivation by pancreatic proteases such
as trypsin and chymotrypsin [2,11]. It is, however,
inactivated when the milk is pasteurized, e.g. heated to
62.5 °C for 30 min [12]. Model experiments in vitro suggest
that the end products of triacylglycerol digestion are
different in the presence of BSSL [5,7]. Because
intraluminal bile salt concentrations are lower during the
neonatal period [13,14], this may be beneficial to product
absorption [5,15].

The carboxylic ester hydrolase (CEH) of human pancreatic
juice [16] seems to be functionally identical, or at least
very similar, to BSSL [5]. The two enzymes also share common
epitopes [8,17], have identical N-terminal amino acid
sequences [17] and are inhibited by inhibitors of serine
esterases, e.g. eserine and diisopropylfluorophosphate
[6,8,16]. It has been hypothesized that the two enzymes are
products of the same gene [18,19]. The observed difference
in molecular size [8,19] could be explained by different
patterns of glycosylation, as recently suggested [17].

Dietary lipids are an important source of energy. The
energy-rich triacylglycerols constitute more than 95% of
these lipids. Some of the lipids, e.g. certain fatty acids
and the fat-soluble vitamins, are essential dietary
constituents. Before gastro-intestinal absorption the
triacylglycerols as well as the minor components, i.e.
esterified fat-soluble vitamins and cholesterol, and
diacylphosphatidylglycerols, require hydrolysis of the
ester bonds to give rise to less hydrophobic, absorbable
products. These reactions are catalyzed by a specific
group of enzymes called lipases.

In the human adult the essential lipases involved are
considered to be gastric lipase, pancreatic
colipase-dependent lipase (tri- and diacylglycerol
hydrolysis), pancreatic phospholipase A2
(diacylphosphatidylglycerols) and carboxylic ester
hydrolase (cholesteryl and fat-soluble vitamin esters). In
the breast-fed newborn, bile salt-stimulated lipase plays an
essential part in the hydrolysis of several of the
above-mentioned lipids.

Together with bile salts the products of lipid digestion
form mixed micelles from which absorption occurs (3-5).

Common causes of lipid malabsorption, and hence
malnutrition, are reduced intraluminal levels of
pancreatic colipase-dependent lipase and/or bile salts.
Typical examples of such lipase deficiency are patients
suffering from cystic fibrosis, a common genetic disorder
resulting in a life-long deficiency in some 80% of the
patients, and chronic pancreatitis, often due to chronic
alcoholism.

The pancreatic and liver functions are not fully developed
at birth, most notably in infants born before term. Fat
malabsorption, for physiological reasons, is a common
finding and is thought to result from low intraluminal
pancreatic colipase-dependent lipase and bile salt
concentrations (3,4,13-15). However, because of BSSL, such
malabsorption is much less frequent in breast-fed infants
than in infants fed pasteurized human milk or infant
formulas (3-5, 12, 59, 60, 61). This is one reason why it
has been advocated that newborn infants, particularly
preterm infants, who cannot be fed their own mother's milk
should be fed non-pasteurized milk from other mothers
(12).

The present treatment of patients suffering from a
deficiency of pancreatic lipase is the oral administration
of very large doses of a crude preparation of porcine
pancreatic enzymes. Colipase-dependent pancreatic lipase
is inactivated by low pH. Such conditions are prevalent in
the stomach, with the result that orally administered
pancreatic lipase is virtually completely inactivated on
passage through the stomach to the gut. Therefore,
this effect cannot be completely overcome by the use of
large doses of enzyme. The large doses administered are
inadequate for most patients, and the preparations are
impure and unpalatable. Certain tablets have been
formulated which pass through the acid regions of the
stomach and discharge the enzyme only in the relatively
alkaline environment of the jejunum. However, many
patients suffering from pancreatic disorders have an
abnormally acid jejunum, and such tablets may fail to
discharge the enzyme and may therefore be ineffective.
Moreover, since the preparations presently on the market
are of non-human origin, there is a risk of
immunoreactions that may cause harmful effects to the
patients or result in reduced therapy efficiency.

A further drawback with the present preparations is that
their content of lipolytic activities other than
colipase-dependent lipase is not stated. In fact, most of
them contain very low levels of CEH/BSSL activity. This may
be one reason why many patients suffering from cystic
fibrosis, in spite of supplementation therapy, suffer from
deficiencies of fat-soluble vitamins and essential fatty
acids.

Thus, there is a great need for products with properties
and structure derived from human lipases and with a broad
substrate specificity, which products may be orally
administered to patients suffering from deficiency of one
or several of the pancreatic lipolytic enzymes. The
products of the present invention fulfil this need by
themselves or in combination with other lipases or in
combination with preparations containing other lipases.
Furthermore, for some human infants there is an obvious
need to improve fat utilization from conventional infant
formulas, or pasteurized human milk from so-called milk
banks.

BSSL has several unique properties that make it ideally
suited for substitution and supplementation therapy:

- It has been designed by nature for oral administration.
Thus, it resists passage through the stomach and is
activated in contents of the small intestine.

- Its specific activation mechanism should prevent
hazardous lipolysis of food or tissue lipids during
storage and passage to its site of action.

- Due to its broad substrate specificity it has the
potential to, on its own, mediate complete digestion of
most dietary lipids, including the fat-soluble vitamin
esters.

- BSSL may be superior to pancreatic colipase-dependent
lipase in hydrolyzing ester bonds containing long-chain
polyunsaturated fatty acids.

- In the presence of gastric lipase and in the absence of,
or at low levels of, colipase-dependent lipase, BSSL can
ensure complete triacylglycerol digestion in vitro
even if the bile salt levels are low, such as in newborn
infants. In the presence of BSSL the end products of
triacylglycerol digestion become free fatty acids and free
glycerol rather than the free fatty acids and
monoacylglycerol generated by the other two lipases (5).
This may favour product absorption, particularly when the
intraluminal bile salt levels are low (3,15).

From a historical point of view infant formulas have been 
developed, and improved, from the concept that their 
composition should be as similar to that of human milk as 
possible. It is desirable to supplement such formulas.

The utilization for supplementation, substitution or
therapy of bile salt-stimulated lipases (BSSL), or of
proteins with the essential functions of BSSL, requires,
however, access to quantities of the product on a large
technical scale. It is not possible on a factory scale to
rely on natural sources such as milk as starting material.
Besides the problem mentioned above with inactivation of
BSSL during pasteurization, there is the additional risk
of contamination of material from a natural source with
infectious agents, e.g. viruses such as HIV and CMV.
There is, accordingly, a need for large-scale access to
products having BSSL properties. The present invention
provides such products and methods for their preparation.


Prior art references are given later in this 
specification. 

The invention

The present invention is based on the cloning of cDNA
coding for BSSL derived from human mammary gland. We have
also isolated, from human pancreas, a partial cDNA coding
for CEH. Deduced amino acid sequences from the human
cDNAs, and comparison with CEH from other species, support
the interpretation that BSSL and CEH are identical.

As will be further detailed below, it was surprisingly
found that the structure of the protein as deduced from
the cDNA sequence is quite different from the structure of
other lipases. The structure proved unexpectedly to be
more like the structure of typical esterases, such as
cholinesterase.

With reference to Figure II and Figure VII, products of
the invention are:

a) a protein as defined by the amino acid sequence 1-722
in Fig. VII,

b) a protein as defined by the amino acid sequence 1-535
in Fig. VII,

c) a protein as defined by the amino acid sequence 1-278
in Fig. VII,

d) a protein as defined by the amino acid sequence 1-341
in Fig. VII,

e) a protein as defined by the amino acid sequence 1-409
in Fig. VII,

f) a protein as defined by the amino acid sequence 1-474
in Fig. VII,

g) combinations of proteins defined under b) - f), e.g.
as defined by the amino acid sequence in positions 1-278,
279-341, 279-409, 279-474, 342-409, 342-474 and 536-722,

h) combinations of proteins defined under b) - g) in
combination with one or more of the repeats according to
Figure V,

i) a protein as defined under a) - h) possessing an
additional N-terminal amino acid, namely methionine,
and functionally equivalent variants and mutants of the
proteins defined in a) - i) above.

It should be noted that the proteins under a, b, c, d, e,
f, h and i above will not be identical in all respects to
naturally occurring BSSL, but they will exhibit one or
more of the critical functions of naturally occurring
BSSL. Critical functions are given below.

j) a DNA sequence coding for the proteins defined in a,
b, c, d, e, f, h and i above.

k) a DNA sequence according to Fig. II, defined by the
following nucleotide numbers in Fig. II:

a) a DNA sequence 151-2316 according to Fig. II, coding
for the protein defined by the amino acid sequence 1-722
in Fig. VII,

b) a DNA sequence 151-1755 according to Fig. II, coding
for the protein defined by the amino acid sequence 1-535
in Fig. VII,

c) a DNA sequence 151-985 according to Fig. II, coding
for the protein defined by the amino acid sequence 1-278
in Fig. VII,

d) a DNA sequence 151-1172 according to Fig. II, coding
for the protein defined by the amino acid sequence 1-341
in Fig. VII,

e) a DNA sequence 151-1376 according to Fig. II, coding
for the protein defined by the amino acid sequence 1-409
in Fig. VII,

f) a DNA sequence 151-1574 according to Fig. II, coding
for the protein defined by the amino acid sequence 1-474
in Fig. VII,

g) a DNA sequence 986-1172 according to Fig. II, coding
for the protein defined by the amino acid sequence 279-341
in Fig. VII,

h) a DNA sequence 986-1376 according to Fig. II, coding
for the protein defined by the amino acid sequence 279-409
in Fig. VII,

i) a DNA sequence 986-1574 according to Fig. II, coding
for the protein defined by the amino acid sequence 279-474
in Fig. VII,

j) a DNA sequence 1173-1376 according to Fig. II, coding
for the protein defined by the amino acid sequence 342-409
in Fig. VII,

k) a DNA sequence 1173-1574 according to Fig. II, coding
for the protein defined by the amino acid sequence 342-474
in Fig. VII.

Significant functions of proteins of the invention are:

a) Suitability for oral administration.

b) Activation by specific bile salts.

c) Action as a non-specific lipase in the contents of the
small intestine, that is, being able to hydrolyse lipids
relatively independently of their chemical structure and
physical state (emulsified, micellar, soluble).

d) Ability to hydrolyze triacylglycerols with fatty acids
of different chain length and different degrees of
unsaturation.

e) Ability to hydrolyze also diacylglycerol,
monoacylglycerol, cholesteryl esters,
lysophosphatidylacylglycerol, and retinyl and other
fat-soluble vitamin esters.

f) Ability to hydrolyze not only the sn-1(3) ester bonds
in a triacylglycerol but also the sn-2 ester bond.

g) Ability to interact with not only primary but also
secondary bile salts.

h) Dependency on bile salts for optimal activity.

i) Stability so that gastric contents will not affect the
catalytic efficiency to any substantial degree.

j) Stability towards inactivation by pancreatic
proteases, e.g. trypsin, provided bile salts are present.

k) Ability to bind to heparin and heparin derivatives,
e.g. heparan sulphate.

l) Ability to bind to lipid-water interfaces.

m) Stability to permit lyophilization.

n) Stability when mixed with food constituents such as
human milk or milk formula.

The critical functions for supplementation, substitution,
or therapy are those according to a), c), d), e), f), i),
j) and l). For other purposes, not all critical functions
may be necessary.

For expression of the proteins indicated above, the
appropriate DNA sequence indicated above will be inserted
into a suitable vector which is then introduced into a
suitable host organism. The said vector will also have to
comprise appropriate signal and other sequences enabling
the organism to express the desired protein.

Suitable expression organisms: 

With recombinant DNA techniques it is possible to
clone and express a protein of interest in a variety of
prokaryotic and eukaryotic host organisms. Possible
expression organisms are bacteria, simple eukaryotes
(yeast), animal cell cultures, insect cell cultures, plant
cell cultures, plants and transgenic animals. Each
individual system has its own particular advantages and
disadvantages. The simple conclusion is that every gene to
be expressed is a unique problem and no standard solution
is available.

Commonly used bacterial systems are E. coli, Bacillus
subtilis and Streptomyces. Commonly used yeasts are
Saccharomyces and Pichia pastoris. Commonly used animal
cells are CHO cells and COS cells. Commonly used insect
cell cultures are Drosophila-derived cells.

A commonly used plant is the tobacco plant. Possible
transgenic animals are goat and cow.

Possible bacterial vectors are exemplified by pUC and
protein A vectors.

A possible yeast vector is exemplified by pMA 91.
Possible insect vectors are derived from baculovirus.
Possible animal cell vectors are derived from SV40.
Possible plant vectors are derived from the Ti-plasmid.
In every system, both natural and synthetic promoters and
terminators can be used.

It is understood that, depending on the choice of
expression system, the expressed protein may contain an
additional N-terminal amino acid (methionine), contain a
few extra amino acids, be fused to a heterologous
protein (e.g. protein A), or differ from the naturally
occurring protein with respect to glycosylation.
Furthermore, the vectors may also contain signal sequences
in order to export the protein to the periplasm or to the
culture medium.

Thus, further aspects of the invention are:

a) a vector comprising a DNA sequence coding for a
protein as specified above,

b) a host organism comprising a DNA sequence as
specified above,

c) a process for the production of a protein as
specified above, by growing a host organism
containing a vector as specified under a) above and
isolating the protein.

Methods for purification are based on the expression
system used (e.g. protein A/IgG) and/or on methods used
for purification of the naturally occurring enzyme, as
described in reference 6.

Additional aspects of the invention are:

- a pharmaceutical composition comprising a protein as
specified above,

- the use of a protein as specified above for the
manufacture of a medicament for the treatment of a
pathological condition related to exocrine pancreatic
insufficiency,

- the use of a protein as specified above for the
manufacture of a medicament for the treatment of cystic
fibrosis,

- the use of a protein as specified above as a supplement
to an infant food formulation,

- the use of a protein as specified above for the
manufacture of a medicament for the treatment of chronic
pancreatitis,

- the use of a protein as specified above for the
manufacture of a medicament for the treatment of fat
malabsorption of any etiology,

- the use of a protein as specified above for the
manufacture of a medicament for the treatment of
malabsorption of fat-soluble vitamins,

- the use of a protein as specified above for the
manufacture of a medicament for the treatment of fat
malabsorption due to physiological reasons, e.g. in
newborn infants.



The DNA sequence in Fig. II from position 151 up to and
including position 2316 is the sequence coding for the entire
protein. The sequence from position 2317 up to and including
position 2415 is not translated to protein, but is included
in exon d identified in Table 2 below.
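
The coordinates quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming only the positions stated in the text (151, 2316, 2317 and 2415); the helper name `coding_region_summary` is hypothetical and is not part of the specification.

```python
def coding_region_summary(first_nt, last_nt):
    """Return (length in nt, whole codons, leftover nt) for an inclusive range."""
    length = last_nt - first_nt + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


# Coding sequence for the entire protein, positions 151..2316 in Fig. II.
coding = coding_region_summary(151, 2316)
print("coding region:", coding)

# Untranslated stretch 2317..2415 that is still part of exon d.
untranslated_length = coding_region_summary(2317, 2415)[0]
print("untranslated nt in exon d:", untranslated_length)
```

Under these assumptions the coding range divides into whole codons with no remainder, and the codon count agrees with the 722 residues stated for the protein of paragraph a).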

In one embodiment of the invention, the protein as defined in
paragraphs a) - i) above is provided in isolated form and/or
in substantially pure form.

The DNA sequences as defined in paragraphs a) - k) above are
in one embodiment of the invention provided in isolated form
and/or in substantially pure form.

Experimental part 

Abbreviations

aa, amino acid; bp, base pair; BSSL, bile salt-stimulated
lipase; c-AMP, cyclic adenosine monophosphate; CEH,
carboxylic ester hydrolase; Da, dalton; dc7GTP,
7-deaza-2'-deoxyguanosine 5'-triphosphate; EDTA, ethylene
diamine tetraacetate; kb, kilobases; MOPS,
3-N-morpholino-propanesulfonic acid; nt, nucleotide; PAGE,
polyacrylamide gel electrophoresis; SDS, sodium dodecyl
sulfate; SSC, NaCl citrate; xGal,
5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside.


Enzymes 

Bile salt-stimulated lipase EC 3.1.1.3 
Carboxylic ester hydrolase EC 3.1.1.1 



Material and Methods

A. Enzyme and antibody preparation

BSSL was purified from human milk as previously described
[6]. When used for antibody production the enzyme was
further purified by SDS-PAGE. The protein band
corresponding to the lipase was, after staining with
Coomassie Brilliant Blue, electroeluted from the gel.
Twenty-five µg of purified enzyme, together with an equal
volume of Freund's complete adjuvant, was used for a first
i.c. injection, and the same amount of enzyme with
incomplete adjuvant for the subsequent monthly booster
injections. The rabbits were bled about two weeks after
each booster, and sera were prepared and stored at -20 °C.

B. Preparation of tryptic fragments and amino acid
sequence analysis

Three mg of purified BSSL was dissolved in 1 ml of 0.1M
Tris-Cl buffer, pH 8.5, containing 6M guanidinium
hydrochloride and 2 mM EDTA. Dithioerythritol was added to
5 mM. After incubation at 37 °C for 2 h, 300 µl of 50 mM
iodoacetate was added. After 90 min incubation at 25 °C in
darkness the reduced and carboxymethylated enzyme was
desalted on a Sephadex G-25 column, equilibrated with
0.5M ammonium bicarbonate. Thirty µg of
tosyl-L-phenylalanine chloromethane-treated bovine trypsin
(Worthington Diagnostics System Inc., Freehold, NJ, USA)
was added before lyophilization. The lyophilized protein
was dissolved in 4 ml of 0.1M ammonium bicarbonate and an
additional 90 µg of trypsin was added. After 5 h incubation
at 37 °C the protein was again lyophilized. The tryptic
digest was dissolved in 0.1% trifluoroacetic acid
(2 mg/ml). Three hundred µg of trypsinated BSSL was
chromatographed on HPLC using a C-18 reversed-phase column
and eluted with a gradient of 0-50% acetonitrile in 0.1%
trifluoroacetic acid. Peptide collection was monitored by
continuous recording of the absorbance at 215 nm. Peptides
to be sequenced were further purified by rechromatography
using the same column with adjusted gradients.
Samples of peptide fragments to be sequenced were dried
under nitrogen to remove acetonitrile and applied to the
sequencer. For N-terminal sequence analysis, native BSSL
was dissolved in 0.1% acetic acid. Sequence analyses were
performed on an Applied Biosystems Inc. 477A pulsed
liquid-phase sequencer and on-line PTH 120A analyzer with
regular cycle programs and chemicals from the
manufacturer. Initial and repetitive yields, calculated
from a sequenced standard protein, β-lactoglobulin, were
47 and 97%, respectively.
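
To give a feeling for what these two yields imply, the sketch below applies the usual geometric model of stepwise Edman degradation, in which the recovered fraction at cycle n is the initial yield multiplied by the repetitive yield raised to the power n-1. The model and the function name `expected_yield` are assumptions for illustration; only the values 47% and 97% come from the text.

```python
def expected_yield(initial_yield, repetitive_yield, cycle):
    """Fraction of applied protein recovered as PTH-amino acid at a given cycle.

    Assumes a simple geometric decay: each cycle retains `repetitive_yield`
    of the signal of the previous one.
    """
    if cycle < 1:
        raise ValueError("cycle numbers start at 1")
    return initial_yield * repetitive_yield ** (cycle - 1)


# Values reported for the beta-lactoglobulin standard.
INITIAL = 0.47
REPETITIVE = 0.97

for cycle in (1, 10, 22, 30):
    print(cycle, round(expected_yield(INITIAL, REPETITIVE, cycle), 3))
```

With these figures the model predicts that a little under one fifth of the applied material would still be recovered at cycle 30, which is compatible with reading an N-terminal stretch of that length.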

C. Isolation of RNA

Samples of human pancreatic, adipose and lactating mammary
gland tissues were obtained at surgery and immediately put
into guanidinium thiocyanate (1-5 g in 50 ml). Total RNA
was extracted as described by Chirgwin [20]. Poly(A)-RNA
was prepared by chromatography on an oligo-deoxythymidylate
(oligo(dT))-cellulose column [21].

D. Construction and screening of cDNA libraries

Approximately 15 µg of polyadenylated RNA from human
pancreas was denatured with methyl mercuric hydroxide [22],
primed with oligo(dT) primers (Pharmacia,
Uppsala, Sweden), and reverse transcribed using
standard procedures [23]. Second-strand synthesis was
carried out according to Gubler and Hoffman [24], except
that DNA ligase and β-NAD were omitted, and the reaction
temperature was set at 15 °C. Excess RNA was digested with
RNAse A (50 µg/ml), and the double-stranded cDNA was
treated with EcoRI methylase [25]. Ends were blunted with
Klenow enzyme. After ligation to EcoRI linkers and
cleavage with EcoRI the cDNA was fractionated on a
Sepharose 4B-CL column. The void volume fraction was
precipitated with ethanol and the cDNA ligated into the
EcoRI site of a phosphatase-treated λgt11 vector [26]. In
vitro packaging yielded more than 7x10* recombinants.

A cDNA library from human mammary gland, derived from
tissue obtained from a woman in the eighth month of
pregnancy, was purchased (Clontech Laboratories, Inc.,
Palo Alto, CA, USA).

Phages from the cDNA libraries were plated at 5x10* plaque
forming units per 120-mm dish. The antiserum was diluted
to a ratio of 1:3200 and screening was performed according
to Young and Davis [27]. Alkaline phosphatase-conjugated
goat anti-rabbit antibodies were used as secondary
antibodies (Bio-Rad, Richmond, CA, USA). To isolate clones
corresponding to the 5'-end of the mRNA, nucleic acid
hybridization was done under standard conditions [23]
using a subcloned fragment from one of the immunopositive
clones as a probe.

E. RNA analysis 

Electrophoresis was carried out in a 1% agarose gel in
40 mM MOPS buffer, pH 7.0, after denaturation with glyoxal
and dimethylsulfoxide [28]. Glyoxalated total RNA was then
transferred to nitrocellulose filters [29]. The blots were
probed with subclones of BSSL and CEH recombinants that
were labelled by the oligo-labeling technique [30].
Prehybridization and hybridization were carried out with
50% formamide at 46 °C [23]. Posthybridization washes were
performed at high stringency (0.1% SDS and 0.1xSSC at
60 °C). (1xSSC: 0.15M NaCl, 0.0015M Na3 citrate, pH 7.5).



F. Nucleotide sequence

cDNA inserts from BSSL and CEH recombinants were either
directly cloned into M13mp18 and mp19 after sonication and
size fractionation, or some of them were further subcloned
into pTZ19R after digestion with PstI, BstXI, NarI, SmaI
and AhaII. The nucleotide sequence was determined by the
dideoxy chain-termination method [31]. The GC-rich repeats
(see below) were also sequenced with TaqI polymerase and
dc7GTP. Both strands were sequenced. Sequence information
was retrieved from autoradiograms by use of the software
MS-EdSeq as described by Sjoberg et al. [55].

G. Amino acid sequence predictions and homologies. 

To predict the corresponding amino acid sequence of the
cDNA inserts, codon usage of different reading frames was
compared according to Staden and gave one open reading
frame [32]. Homologies were searched for with the programs
of the UWGCG software package [33].
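
The idea of comparing reading frames can be illustrated with a much simpler stand-in than the codon-usage statistics of Staden [32]: scanning the three forward frames for the longest stretch running from an ATG to an in-frame stop codon. The sketch below is only that simplified scan, not the method used here, and the function name `longest_orf` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (frame, start, end) of the longest ATG..stop ORF, or None.

    Only the three forward frames are scanned. `start` is the 0-based index
    of the A of ATG; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                candidate = (frame, start, i + 3)
                if best is None or candidate[2] - candidate[1] > best[2] - best[1]:
                    best = candidate
                start = None  # close the frame and keep scanning
    return best


toy = "CCATGGCTAAGCTGTAGGA"  # made-up sequence, not taken from Fig. II
frame, start, end = longest_orf(toy)
print(frame, toy[start:end])
```

For a real cDNA the same scan would also be run on the reverse complement, and a codon-usage score would be needed to distinguish a genuine coding frame from a chance open stretch.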

Results and Discussion

A. Sequences of tryptic fragments and the N-terminus of
BSSL

Trypsin digestion of purified BSSL resulted in
approximately 50 fragments as judged by the number of
peaks obtained during the HPLC chromatography (Fig. I).
The peaks were collected, and the indicated peaks, which
could be isolated in a highly purified state and in
reasonable quantities, were sequenced. The resulting
sequences are shown in Table I. In addition the 30 most
N-terminal residues were sequenced (Fig. II), and they
confirm the previously reported sequence of Abouakil et
al. (30 residues) [17].
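
As an aside on how such fragments arise, trypsin is generally taken to cleave on the C-terminal side of lysine and arginine, except where the next residue is proline. The sketch below applies that textbook rule to a made-up peptide; the function name `tryptic_digest` and the example sequence are hypothetical and are not taken from Table I or Fig. II.

```python
def tryptic_digest(protein):
    """Split a one-letter amino acid sequence after K or R unless P follows."""
    fragments = []
    current = ""
    for i, residue in enumerate(protein):
        current += residue
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)  # C-terminal fragment without K/R at its end
    return fragments


example = "AKGRPLKMRSTV"  # hypothetical peptide for illustration only
print(tryptic_digest(example))
```

Counting the fragments predicted in this way for a full-length sequence gives a rough expectation for the number of HPLC peaks, although incomplete cleavage and co-eluting peptides mean the observed number can differ.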

The 22-residue N-terminal sequence reported by
Wang & Johnson [34] differs from ours at one position,
which is a glycine in our report and a
lysine in Wang & Johnson's.

B. Nucleotide sequence of BSSL

For construction of the λgt11 cDNA library we used
polyadenylated RNA from human pancreas. Initially, this
pancreatic expression cDNA library was screened with
antiserum against BSSL, and four immunopositive clones
were isolated. Nucleotide sequence analysis of
the four clones showed that they are in perfect agreement
and correspond to the 3'-end of the mRNA. They all begin
with a poly A tail and differ only in length; the longest
insert, designated λCEH, spans 996 bp.

A cDNA library from human mammary gland was screened with
antiserum and with the pancreas clone λCEH as probe.
Positive clones were isolated from both screenings, all of
which originate from the 3'-end. The longest mammary gland
clone, designated λBSSL, reaches 2100 bp upstream. It
contains four of the sequenced tryptic fragments (Fig.
II), but does not include the N-terminal amino acid
sequence. To extend the sequence beyond the translation
start, the mammary gland cDNA library was rescreened with
a 118-base probe derived from the most 5'-proximal
part of λBSSL. One clone was isolated that continued a
further 328 nucleotides upstream. It matched the
N-terminal amino acid sequence, and contained the remaining
tryptic fragment. As shown in Fig. II, the cDNA is 2428
nucleotides long and contains 81 bases upstream from the
first ATG codon. The polyadenylation signal, AATAAA, is
located 13 nucleotides upstream from the poly A tail, and
the termination codon TAG was found at nucleotide 2317,
followed by a 3'-untranslated region of 112 bp.
A GC-rich region consisting of 16 repeats of 33 bases was
found in the 3'-end of the sequence between base 1756 and
2283. The nucleotide sequence of the repetition, shown
in Fig. III, consists of six identical repetitions
surrounded by ten repetitions with different numbers of
substitutions that have probably occurred after several
duplications. The low number of substitutions suggests
that these repetitions have appeared late during
evolution.
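
The repeat coordinates are internally consistent, as the short check below shows. It is a sketch that assumes the mature protein starts at nucleotide 151 (as stated for the coding sequence in Fig. II) and that residues are numbered from 1 at that position; the helper name `residue_of` is hypothetical.

```python
CODING_START = 151  # first nucleotide of residue 1, as stated for Fig. II


def residue_of(nucleotide):
    """Residue number whose codon contains the given nucleotide position."""
    return (nucleotide - CODING_START) // 3 + 1


REPEAT_FIRST_NT = 1756
REPEAT_LAST_NT = 2283
REPEAT_COUNT = 16
REPEAT_LENGTH_NT = 33

repeat_span = REPEAT_LAST_NT - REPEAT_FIRST_NT + 1
first_residue = residue_of(REPEAT_FIRST_NT)
last_residue = residue_of(REPEAT_LAST_NT)

print("span in nt:", repeat_span, "expected:", REPEAT_COUNT * REPEAT_LENGTH_NT)
print("residues covered:", first_residue, "to", last_residue)
print("residues per repeat:", (last_residue - first_residue + 1) // REPEAT_COUNT)
```

On these assumptions the repeat region begins immediately after residue 535, which matches the boundary used for the truncated protein 1-535 and for the fragment 536-722 listed among the products of the invention.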

C. Tissue distribution of expression

RNA from human lactating mammary gland, pancreas, adipose
tissue and from a human hepatoma cell line (HepG2) was
analyzed by Northern blotting. The size of the messenger
was determined to be approximately 2.5 kb in both
lactating mammary gland and pancreas. No signal could be
detected in the lanes with RNA extracted from HepG2 or
adipose tissue (Fig. IV).

Since the mRNA used for the mammary gland library was
obtained from a female in her 8th month of pregnancy, it
is evident that transcription and probably translation of
the BSSL gene is turned on before partus, in agreement
with previous findings on BSSL secretion before partus
[35]. See Figure IV.

D. Amino acid sequence of BSSL 

Assessed by SDS-PAGE the molecular mass has been reported
to be 107-125 kDa [8,36] and by analytical
ultracentrifugation to be 105 kDa [37]. The enzyme, as
deduced from the cDNA, consists of 722 amino acid residues
(Fig. II) which, giving a molecular mass of 76,271 Da,
indicates that the enzyme contains at least 15-20%
carbohydrate. The leader sequence is 23 residues long. A
tentative active site serine residue is localized to
serine-217 (Fig. V). The sequence around this serine
accords with the consensus active site sequence of serine
hydrolases [38]. It has recently been proposed that
basic residues found close to the active site serine may
be involved in the cleavage of ester bonds in acylglycerols
by lipases [39]. It is interesting to note that such
residues are not present in BSSL. The single tentative
N-glycosylation site is localized only seven residues from
the serine. The degree of glycosylation [6,16] suggests
that the enzyme contains O-linked carbohydrate. There are
numerous sites where such glycosylation could have
occurred. The amino acid composition based on purified
enzyme has shown a high content of proline residues [6].
The amino acid sequence obtained from cDNA confirms this.
Moreover, most of the proline residues are localized in
the 16 repeats of 11 residues each, constituting the main
part of the C-terminal half of the enzyme.
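
As a rough check of the arithmetic behind the carbohydrate estimate, the share of the apparent molecular mass that is not accounted for by the peptide chain can be computed from the figures quoted above. This is only an illustrative sketch: the function name is hypothetical, and it assumes that the apparent masses from SDS-PAGE and ultracentrifugation can be compared directly with the cDNA-derived peptide mass, which is not always the case for glycoproteins.

```python
# Peptide mass deduced from the cDNA (722 residues), in daltons.
PEPTIDE_MASS_DA = 76_271


def non_peptide_fraction(apparent_mass_da, peptide_mass_da=PEPTIDE_MASS_DA):
    """Fraction of the apparent mass not explained by the peptide chain."""
    if apparent_mass_da <= 0:
        raise ValueError("apparent mass must be positive")
    return (apparent_mass_da - peptide_mass_da) / apparent_mass_da


# Apparent masses quoted in the text, in daltons.
reported = {
    "analytical ultracentrifugation": 105_000,
    "SDS-PAGE, lower bound": 107_000,
    "SDS-PAGE, upper bound": 125_000,
}

for method, mass in reported.items():
    print(f"{method}: {non_peptide_fraction(mass):.1%} non-peptide mass")
```

All three ratios lie above the lower bound of 15-20% stated in the text.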

E. Comparison of the enzymes in mammary gland (BSSL) and
pancreas (CEH)

BSSL of human milk and human pancreas CEH have previously
been shown to be similar, if not identical. The present
data strongly suggest that the two enzymes are products
of the same gene. The nucleotide sequence of the cDNA
clones shows that the pancreatic clone ACEH is identical
with the mammary gland clone ABSSL from the poly A tail
and 996 bases towards the 5'-end, including the sequence
coding for the proline-rich repeats. Northern blot gave a
single band of 2.5 kb in RNA from pancreas and lactating
mammary gland (Fig. IV). Genomic Southern blots further
support the idea that only one gene codes for BSSL and
CEH. The difference in mobility on SDS-PAGE between BSSL
and CEH can be explained as a consequence of different
glycosylation or differential splicing.

The similarity of BSSL to the rat and bovine enzymes (see
below), together with the results from genomic blots,
supports the view that differential splicing cannot account
for the mobility difference. Since the C-terminal sequence
has not been confirmed on the protein level there is a less
likely possibility that CEH may be processed by a
proteolytic cleavage in the C-terminal end.

So far as we know, pancreatic enzymes that obviously
correspond to CEH have often been named after species and
the particular substrates used to determine their
respective activities: lysophospholipase, cholesterol
esterase, sterol ester hydrolase, non-specific lipase,
carboxyl ester lipase and cholesteryl ester hydrolase.
Available data are compatible with the view that all these
activities originate in one and the same
functional entity [42,43]. This illustrates the broad
substrate specificity and the relevance of designating
them as non-specific lipases. When the sequence of human
BSSL/CEH is compared to the sequence of lysophospholipase
from rat pancreas [40] and cholesterol esterase from
bovine pancreas [41], extensive similarities are found that
extend about 530 residues from the N-terminal (Fig. V); but
they differ in the part of the molecule where the repeats
occur. The rat enzyme has only four repeats and the bovine
three. Hence the human enzyme is a considerably longer
peptide.

Moreover, striking similarities were found between BSSL
and a number of typical esterases, e.g. acetylcholine
esterases from several species, including man and
Drosophila, and carboxyl esterases (Fig. VI). These
similarities were restricted to the N-terminal 300
residues of BSSL, which include the tentative active site
serine residue. A similarity to acetylcholine esterase
has been predicted from the fact that BSSL is inhibited by
typical choline esterase inhibitors [6, 8, 16]. With the
possible exception of the rat liver carboxyl esterase
[45], none of these similar enzymes has been shown to have
the same bile-salt dependency as BSSL; this suggests that
the structural basis for this property resides in the
C-terminal part of the protein. Moreover, BSSL can
efficiently attack emulsified substrates, which is not a
known characteristic of the similar esterases. For this
activity bile salt is a prerequisite.

The predicted sequence for human BSSL was compared with
other well characterized mammalian lipases. Apart from the
consensus sequence around the active site serine
(G-X-S-X-G), no obvious similarities were found [44].
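
The consensus pattern G-X-S-X-G can be located in a sequence with a short pattern search. The sketch below is illustrative only: the function name is hypothetical, and the input is the BSSL segment as transcribed from the 201-250 block of Fig. 5, so positions are reported relative to that segment and not as residue numbers of the protein.

```python
import re

# BSSL segment as transcribed from Fig. 5 (alignment columns 201-250).
SEGMENT = "NIAAFGGDPNNITLFGESAGGASVSLQTLSPYNKGLIRRAISQSGVALSP"


def find_gxsxg(seq):
    """Return (0-based start, matched text) for every G-X-S-X-G occurrence.

    A lookahead is used so that overlapping occurrences are not missed.
    """
    return [(m.start(), m.group(1)) for m in re.finditer(r"(?=(G.S.G))", seq)]


hits = find_gxsxg(SEGMENT)
print(hits)
```

In this segment the only occurrence is GESAG, whose central serine is the tentative active site residue discussed above.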

In addition to the similarities with other enzymes, there
are also significant similarities to one cAMP-dependent
protein from Dictyostelium discoideum [46] as well as to
thyroglobulin from several species (Fig. VI) [47-49]. The
similarities between BSSL and thyroglobulin, which
comprise the active site region but not the active site
itself, indicate that these highly conserved stretches of
amino acids are of more generalized importance than merely
supporting the enzymatic activity of esterases.

In conclusion, human milk BSSL consists of 722 amino acid
residues. Available data strongly indicate that its
peptide chain is identical to that of pancreatic CEH, and
that they are coded for by the same gene. The strongest
evidence is that the nucleotide sequences of their
3'-ends and their N-terminal amino acid sequences are
identical. The striking homologies found to rat pancreatic
lysophospholipase and bovine pancreatic cholesterol
esterase support the hypothesis that these enzymes are
also functionally identical. However, as has been
suggested, the different molecular sizes found among
species are not due to differences merely in
glycosylation; instead they reflect a variable number of
an eleven amino acid repeat. The similarity of the active
site sequence between these esterases suggests that these
proteins derive from a common ancestral gene.

With reference to Figures I-VII, the following legends are
given.

Figure I: Separation of the tryptic digest of BSSL on HPLC

Purified BSSL was treated with trypsin and chromatographed
on HPLC as described in Materials and Methods. The
indicated peaks were collected and purified further by
rechromatography, and their amino acid sequences were determined.
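
The tryptic fragments can be related to the deduced sequence with a simple in silico digest. The sketch below is an illustration and not the procedure used for the figure: it assumes the common rule that trypsin cleaves after Lys or Arg unless the next residue is Pro, the function name is hypothetical, and the input is the BSSL segment as transcribed from the 351-400 block of Fig. 5.

```python
def tryptic_digest(seq):
    """Cleave after every K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if len(seq) > i + 1 else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if len(seq) > start:
        # Keep the C-terminal remainder that does not end in K or R.
        peptides.append(seq[start:])
    return peptides


# BSSL segment as transcribed from Fig. 5 (alignment columns 351-400).
SEGMENT = "IDMPAINKGNKKVTEEDFYKLVSEFTITKGLRGAKTTFDVYTESWAQDPS"

peptides = tryptic_digest(SEGMENT)
print(peptides)
```

The fragment LVSEFTITK corresponds to TP20 in Table 1 (LeuValSerGluPheThrIleThrLys). VTEEDFYK is TP16 without its leading Lys, which would be consistent with one missed cleavage between the two adjacent lysines.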

Figure II: The cDNA nucleotide sequence and the deduced
amino acid sequence for human bile salt-stimulated lipase:

The cDNA is 2428 bases long. The N-terminal 23-codon
sequence (nt 82-150), starting with an ATG, is interpreted
as a leader peptide since the N-terminal amino acid
sequence of the mature protein starts at codon 24 (nt 151,
Ala). The leader peptide is underlined. The sign *
indicates the starting point of an exon. The sign #
indicates the starting point of the repetition part.

Figure III: The nucleotide sequence of the C-terminal
GC-rich repetitions in the bile salt-stimulated lipase:

Substitutions are indicated by a *.

Figure IV: Northern blot hybridization

Northern blot analysis of total RNA isolated from human
lactating mammary gland, pancreas, adipose tissue and a
human hepatoma cell line (HepG2). Total RNA (10 µg) from
lactating mammary gland (lane A), pancreas (lane B),
adipose tissue (lane C) and HepG2 (lane D) was
electrophoresed in a 1% agarose gel in 40 mM MOPS buffer at
pH 7.0 after denaturation of RNA in 1 M glyoxal, 50%
dimethylsulfoxide and 40 mM MOPS. The glyoxalated RNA was
then transferred to nitrocellulose paper for hybridization
with [32P]-labeled BSSL cDNA (ABSSL).

Figure V: Comparison of the deduced amino acid sequence
from human milk BSSL, rat pancreatic lysophospholipase
(Ratlpl) [40] and bovine pancreatic cholesterol esterase
(Bovceh) [41]:

The serine residues involved in the active site are
indicated by a *, and the # indicates the single possible
N-glycosylation signal of the protein. The direct repeats
of amino acid sequences are boxed. Matching sequences are
denoted in capital letters, matching sequences between two
enzymes are denoted in small letters and mismatches with
a dot.

Figure VI: Comparison of the primary structure of BSSL to
other esterases, thyroglobulin and to one cAMP-dependent
enzyme from Dictyostelium discoideum:

BSSL: bile salt-stimulated lipase from human, Cheshum:
cholinesterase from fetal human tissue [50], Torpace:
acetylcholinesterase from Torpedo marmorata [51],
Drosceh: carboxylic ester hydrolase from Drosophila
melanogaster [52], Ratlivce: carboxyl esterase from rat
liver [53], Drosace: acetylcholinesterase from Drosophila
melanogaster [54], Thyrhum: thyroglobulin from human [49]
and Dict.Di: cAMP-dependent enzyme from Dictyostelium
discoideum [46]. There are 7 different domains that show
similarities between the enzymes. Boxes enclose residues
which are identical and small letters in the consensus
sequence indicate identical residues in all the enzymes
except for one. Dots indicate mismatches. The serine
residue involved in the active site is indicated with *.
The figure in the right-hand corner shows how the domains
are oriented.

Figure VII:

Gives the amino acid sequence 1-722 for the entire
protein (one letter code) and indicates exons a, b, c,
and d. The sign # indicates the starting point of the
repetition part.
Table 1. Amino acid sequence of BSSL peptides

Due to interfering peaks no positive identification of the
residue in cycles 1 and 2 of the sequencing could be made
in peptide number 26. The peptide numbers refer to the
peaks in Fig. I.

Tryptic fragments

TP16: LysValThrGluGluAspPheTyrLys
TP19: GlyIleProPheAlaAlaProThrLys
TP20: LeuValSerGluPheThrIleThrLys
TP24: ThrTyrAlaTyrLeuPheSerHisProSerArg
TP26: PheAspValTyrThrGluSerTrpAlaGlnAsp
      ProSerGlnGluAsnLys
Table 2: Identification of the exons a, b, c and d,
numbered as in Figures II and VII.

exon                     Location
                         between nucleotide    between amino acid
                         number                number

a                        986-1172              279-341
b                        1173-1376             342-409
c                        1377-1574             410-474
d                        1575-2415             475-722
the entire protein       151-2316              1-722
the entire protein       151-1755              1-535
excluding repetitions
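
The relation between the amino acid numbering and the nucleotide numbering for the mature protein can be sketched as follows. The function name is hypothetical, and the only assumption is the one stated in the legend to Figure II, namely that residue 1 of the mature protein starts at nucleotide 151.

```python
# Assumption from the legend to Figure II: residue 1 of the mature
# protein is encoded by the codon starting at nucleotide 151.
MATURE_START_NT = 151


def residues_to_nucleotides(first_res, last_res, start_nt=MATURE_START_NT):
    """Codon-aligned nucleotide range (inclusive) for a residue range."""
    if first_res < 1 or last_res < first_res:
        raise ValueError("expected 1 <= first_res <= last_res")
    first_nt = start_nt + 3 * (first_res - 1)
    last_nt = start_nt + 3 * last_res - 1
    return first_nt, last_nt


print(residues_to_nucleotides(1, 722))  # (151, 2316)
print(residues_to_nucleotides(1, 535))  # (151, 1755)
```

The two whole-protein rows of Table 2 are reproduced exactly. The exon rows need not be, since an exon boundary can fall inside a codon: for residues 279-341 the codon-aligned range is 985-1173, one nucleotide wider on each side than the 986-1172 given for exon a.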
What we claim is:

1. A protein as indicated in Figure VII from position 1
to position 722.

2. A protein as indicated in Figure VII from position 1
to position 535.

3. A protein as indicated in Figure VII from position 1
to position 278.

4. A protein as indicated in Figure VII from position 1
to position 341.

5. A protein as indicated in Figure VII from position 1
to position 409.

6. A protein as indicated in Figure VII from position 1
to position 474.

7. A protein as indicated in Figure VII from position 1
to position 278, position 279 to position 341,
position 279 to position 409, position 279 to
position 474, position 342 to position 409, position
342 to position 474, or position 536 to position 722.

8. A protein according to any of claims 1-7 in
combination with one or more of the repeats according
to Figure V.

9. A protein according to any of claims 1-8, possessing
methionine as additional N-terminal amino acid.

10. A protein according to any of claims 1-9, exhibiting
one or several of the critical functions of
naturally occurring bile salt-stimulated lipase.
11. A functionally equivalent variant or mutant of a
protein according to any of claims 1-9.

12. A DNA sequence coding for a protein with the amino
acid sequence according to claim 1.

13. A DNA sequence coding for a protein with the amino
acid sequence according to claim 2.

14. A DNA sequence coding for a protein with the amino
acid sequence according to any of claims 3, 4, 5, 6,
7, 8, 9, 10 and 11.

15. A DNA sequence defined by the following nucleotide
numbers in Figure II:

151-2316
151-1755
151-985
151-1172
151-1376
151-1574
986-1172
986-1376
986-1574
1173-1376
1173-1574

16. A vector comprising a DNA sequence coding for a
protein according to claims 1-11.

17. A vector comprising a DNA sequence according to
claims 12 - 15.

18. A host organism transformed with a vector according
to claims 16 or 17.
19. A process for the production of a protein according
to claims 1 - 11 by growing a host organism
containing a vector according to claims 16 or 17, and
isolating the protein.

20. A pharmaceutical composition comprising a protein
according to any of claims 1-11.

21. A pharmaceutical composition comprising a protein
according to any of claims 1-11 in combination with
a lipase or in combination with preparations
containing lipases.

22. The use of a protein according to any of claims
1-11 for the manufacture of a medicament for the
treatment of a pathological condition related to
exocrine pancreatic insufficiency.

23. The use of a protein according to any of claims
1-11 for the manufacture of a medicament for the
treatment of cystic fibrosis.

24. The use of a protein according to any of claims
1-11 as a supplement to an infant food formulation.

25. The use of a protein according to any of claims 1-11
for the manufacture of a medicament for the
treatment of chronic pancreatitis.

26. The use of a protein according to any of claims 1-11
for the manufacture of a medicament for the
treatment of fat malabsorption.

27. The use of a protein according to any of claims 1-11
for the manufacture of a medicament for the
treatment of malabsorption of fat soluble vitamins.
28. The use of a protein according to any of claims 1-11
for the manufacture of a medicament for the
treatment of fat malabsorption due to physiological
reasons.

29. The use, according to claims 22-28, of a protein
according to any of claims 1-11 in combination with a
lipase or lipases or in combination with preparations
containing a lipase or lipases.

30. An infant food formulation, supplemented with a protein
according to any of claims 1-11.

31. A protein according to claims 1 - 11 in substantially
pure form.

32. A protein according to claims 1 - 11 in isolated form.

33. A DNA sequence according to claims 12 - 15 in
substantially pure form.

34. A DNA sequence according to claims 12 - 15 in isolated
form.

35. A process for the preparation of a pharmaceutical
composition according to claims 20 or 21 by
incorporating a protein according to claims 1 - 11, 31,
or 32 in a pharmacologically acceptable carrier.

36. A process according to claim 35 for the preparation of
a pharmaceutical composition for oral administration.

37. A process for the preparation of an infant food
formulation according to claim 30, by supplementing an
infant food formulation with a protein according to
claims 1 - 11, 31, or 32.
Fig. 2 (1/4)

ACCTTCTGTA TCAGTTAAGT GTCAAGATCG AAGGAACAGC AGTCTCAAGA TAATGCAAAG 60

AGTTTATTCA TCCAGAGGCT G ATG CTC ACC ATG GGG CGC CTG GAA CTC GTT 111
                        Met Leu Thr Met Gly Arg Leu Gln Leu Val
                        1 5 10

GTG TTG GGC CTC ACC TGC TGC TGG GCA GTG GCG ACT GCC GCG AAG CTG 159
Val Leu Gly Leu Thr Cys Cys Trp Ala Val Ala Ser Ala Ala Lys Leu
15 20 25

GGC GCC GTG TAC ACA GAA GGT GGG TTC GTG GAA GGC GTC AAT AAG AAG 207
Gly Ala Val Tyr Thr Glu Gly Gly Phe Val Glu Gly Val Asn Lys Lys
30 35 40

CTC GGC CTC CTG GGT GAC TCT GTG GAC ATC TTC AAG GGC ATC CCC TTC 255
Leu Gly Leu Leu Gly Asp Ser Val Asp Ile Phe Lys Gly Ile Pro Phe
45 50 55

GCA GCT CCC ACC AAG GCC CTG GAA AAT CCT CAG CCA CAT CCT GGC TGG 303
Ala Ala Pro Thr Lys Ala Leu Glu Asn Pro Gln Pro His Pro Gly Trp
60 65 70

CAA GGG ACC CTG AAG GCC AAG AAC TTC AAG AAG AGA TGC CTG CAG GCC 351
Gln Gly Thr Leu Lys Ala Lys Asn Phe Lys Lys Arg Cys Leu Gln Ala
75 80 85 90

ACC ATC ACC CAG GAC AGC ACC TAC GGG GAT GAA GAC TGC CTG TAC CTC 399
Thr Ile Thr Gln Asp Ser Thr Tyr Gly Asp Glu Asp Cys Leu Tyr Leu
95 100 105

AAC ATT TGG GTG CCC CAG GGC AGG AAG CAA GTC TCC CGG GAC CTG CCC 447
Asn Ile Trp Val Pro Gln Gly Arg Lys Gln Val Ser Arg Asp Leu Pro
110 115 120

GTT ATG ATC TGG ATC TAT GGA GGC GCC TTC CTC ATG GGG TCC GGC CAT 495
Val Met Ile Trp Ile Tyr Gly Gly Ala Phe Leu Met Gly Ser Gly His
125 130 135

GGG GCC AAC TTC CTC AAC AAC TAC CTG TAT GAC GGC GAG GAG ATC GCC 543
Gly Ala Asn Phe Leu Asn Asn Tyr Leu Tyr Asp Gly Glu Glu Ile Ala
140 145 150

ACA CGC GGA AAC GTC ATC GTG GTC ACC TTC AAC TAC CGT GTC GGC CCC 591
Thr Arg Gly Asn Val Ile Val Val Thr Phe Asn Tyr Arg Val Gly Pro
155 160 165 170

CTT GGG TTC CTC AGC ACT GGG GAC GCC AAT CTG CCA GCT AAC TAT GGC 639
Leu Gly Phe Leu Ser Thr Gly Asp Ala Asn Leu Pro Gly Asn Tyr Gly
175 180 185
CTT CGG GAT CAG CAC ATG GCC ATT GCT TCG GTC AAG AGG AAT ATC GCG 687
Leu Arg Asp Gln His Met Ala Ile Ala Trp Val Lys Arg Asn Ile Ala
190 195 200

GCC TTC GGG GGG GAC CCC AAC AAC ATC ACG CTC TTC GGG GAG TCT GCT 735
Ala Phe Gly Gly Asp Pro Asn Asn Ile Thr Leu Phe Gly Glu Ser Ala
205 210 215

GGA GGT GCC AGC GTC TCT CTG CAG ACC CTC TCC CCC TAC AAC AAG GGC 783
Gly Gly Ala Ser Val Ser Leu Gln Thr Leu Ser Pro Tyr Asn Lys Gly
220 225 230

CTC ATC CGG GGA GCC ATC AGC CAG AGC GGC GTG GCC CTG ACT CCC TGG 831
Leu Ile Arg Arg Ala Ile Ser Gln Ser Gly Val Ala Leu Ser Pro Trp
235 240 245 250

GTC ATC CAG AAA AAC CCA CTC TTC TCG CCC AAA AAG GTG GCT GAG AAG 879
Val Ile Gln Lys Asn Pro Leu Phe Trp Ala Lys Lys Val Ala Glu Lys
255 260 265

GTG GGT TGC CCT GTG GCT GAT GCC GCC AGG ATG GCC CAG TCT CTG AAG 927
Val Gly Cys Pro Val Gly Asp Ala Ala Arg Met Ala Gln Cys Leu Lys
270 275 280

CTT ACT CAT CCC CGA GCC CTG ACG CTG GCC TAT AAG GTG CCG CTG GCA 975
Val Thr His Pro Arg Ala Leu Thr Leu Ala Tyr Lys Val Pro Leu Ala
285 290 295

*

GGC CTG GAG TAC CCC ATG CTG CAC TAT GTG GGC TTC GTC CCT GTC ATT 1023
Gly Leu Glu Tyr Pro Met Leu His Tyr Val Gly Phe Val Pro Val Ile
300 305 310

GAT GGA GAC TTC ATC CCC GCT GAC CCG ATC AAC CTG TAC GCC AAC GCC 1071
Asp Gly Asp Phe Ile Pro Ala Asp Pro Ile Asn Leu Tyr Ala Asn Ala
315 320 325 330

GCC GAC ATC GAC TAT ATA GCA GGC ACC AAC AAC ATG GAC GGC CAC ATC 1119
Ala Asp Ile Asp Tyr Ile Ala Gly Thr Asn Asn Met Asp Gly His Ile
335 340 345

TTC GCC AGC ATC GAC ATG CCT GCC ATC AAC AAG GGC AAC AAG AAA GTC 1167
Phe Ala Ser Ile Asp Met Pro Ala Ile Asn Lys Gly Asn Lys Lys Val
350 355 360

ACG GAG GAG GAC TTC TAC AAG CTG GTC ACT GAG TTC ACA ATC ACC AAG 1215
Thr Glu Glu Asp Phe Tyr Lys Leu Val Ser Glu Phe Thr Ile Thr Lys
365 370 375

GGG CTC AGA GGC GCC AAG ACG ACC TTT GAT GTC TAC ACC GAG TCC TGG 1263
Gly Leu Arg Gly Ala Lys Thr Thr Phe Asp Val Tyr Thr Glu Ser Trp
380 385 390
Fig. 2 (3/4)

GCC CAG GAC CCA TCC GAG GAG AAT AAG AAG AAG ACT GTG GTG GAC TTT 1311
Ala Gln Asp Pro Ser Gln Glu Asn Lys Lys Lys Thr Val Val Asp Phe
395 400 405 410

GAG ACC GAT GTC CTC TTC CTG GTG CCC ACC GAG ATT GCC CTA GCC CAG 1359
Glu Thr Asp Val Leu Phe Leu Val Pro Thr Glu Ile Ala Leu Ala Gln
415 420 425

CAC AGA GCC AAT GCC AAG ACT GCC AAG ACC TAG GCC TAG CTG TTT TCC 1407
His Arg Ala Asn Ala Lys Ser Ala Lys Thr Tyr Ala Tyr Leu Phe Ser
430 435 440

CAT CCC TCP CGG ATG CCC GTC TAC CCC AAA TGG GTG GGG GCC GAC CAT 1455
His Pro Ser Arg Met Pro Val Tyr Pro Lys Trp Val Gly Ala Asp His
445 450 * 455

GCA GAT GAC ATT CAG TAC GTT TTC GGG AAG CCC TTC GCC ACC CCC ACG 1503
Ala Asp Asp Ile Gln Tyr Val Phe Gly Lys Pro Phe Ala Thr Pro Thr
460 465 470

GGC TAC CGG CCC CAA GAC AGG ACA GTC TCT AAG GCC ATG ATC GCC TAC 1551
Gly Tyr Arg Pro Gln Asp Arg Thr Val Ser Lys Ala Met Ile Ala Tyr
475 480 485 490

*

TGG ACC AAC TTT GCC AAA ACA GGG GAC CCC AAC ATG GGC GAC TCG GCT 1599
Trp Thr Asn Phe Ala Lys Thr Gly Asp Pro Asn Met Gly Asp Ser Ala
495 500 505

GTG CCC ACA CAC TGG GAA CCC TAC ACT ACG GAA AAC AGC GGC TAC CTG 1647
Val Pro Thr His Trp Glu Pro Tyr Thr Thr Glu Asn Ser Gly Tyr Leu
510 515 520

GAG ATC ACC AAG AAG ATG GGC AGC AGC TCC ATG AAG CGG AGC CTG AGA 1695
Glu Ile Thr Lys Lys Met Gly Ser Ser Ser Met Lys Arg Ser Leu Arg
525 530 535

ACC AAC TTC CTG CGC TAC TGG ACC CTC ACC TAT CTG GCG CTG CCC ACA 1743
Thr Asn Phe Leu Arg Tyr Trp Thr Leu Thr Tyr Leu Ala Leu Pro Thr
540 545 550

#

GTG ACC GAC CAG GAG GCC ACC CCT GTG CCC CCC ACA GGG GAC TCC GAG 1791
Val Thr Asp Gln Glu Ala Thr Pro Val Pro Pro Thr Gly Asp Ser Glu
555 560 565 570

GCC ACT CCC GTG CCC CCC ACG GGT GAC TCC GAG ACC GCC CCC GTG CCG 1839
Ala Thr Pro Val Pro Pro Thr Gly Asp Ser Glu Thr Ala Pro Val Pro
575 580 585

CCC ACG GGT GAC TCC GGG GCC CCC CCC GTG CCG CCC ACG GGT GAC TCC 1887
Pro Thr Gly Asp Ser Gly Ala Pro Pro Val Pro Pro Thr Gly Asp Ser
590 595 600
GGG GCC CCC CCC TTG CCG CCC ACG GGT GAC TCC GGG GCC CCC CCC GTG 1935
Gly Ala Pro Pro Leu Pro Pro Thr Gly Asp Ser Gly Ala Pro Pro Val
605 610 615

CCG CCC ACG GGT GAC TCC GGG GCC CCC CCC GTG CCG CCC ACG GCT GAC 1983
Pro Pro Thr Gly Asp Ser Gly Ala Pro Pro Val Pro Pro Thr Gly Asp
620 625 630

TCC GGG GCC CCC CCC GTG CCG CCC ACG GCT GAC TCC GGG GCC CCC CCC 2031
Ser Gly Ala Pro Pro Val Pro Pro Thr Gly Asp Ser Gly Ala Pro Pro
635 640 645 650

GTG CCG CCC ACG GGT GAC TCC GGC GCC CCC CCC GTG CCG CCC ACG GGT 2079
Val Pro Pro Thr Gly Asp Ser Gly Ala Pro Pro Val Pro Pro Thr Gly
655 660 665

GAC GCC GGG CCC CCC CCC GTG CCG CCC ACG GGT GAC TCC GGC GCC CCC 2127
Asp Ala Gly Pro Pro Pro Val Pro Pro Thr Gly Asp Ser Gly Ala Pro
670 675 680

CCC GTG CCG CCC ACG GCT GAC TCC GGG GCC CCC CCC GTG ACC CCC ACG 2175
Pro Val Pro Pro Thr Gly Asp Ser Gly Ala Pro Pro Val Thr Pro Thr
685 690 695

GGT GAC TCC GAG ACC GCC CCC GTG CCG CCC ACG GGT GAC TCC GGG GCC 2223
Gly Asp Ser Glu Thr Ala Pro Val Pro Pro Thr Gly Asp Ser Gly Ala
700 705 710

CCC CCT GTG CCC CCC ACG GGT GAC TCT GAG GCT GCC CCT CTG CCC CCC 2271
Pro Pro Val Pro Pro Thr Gly Asp Ser Glu Ala Ala Pro Val Pro Pro
715 720 725 730

ACA GAT GAC TCC AAG GAA GCT CAG ATG CCT GCA GTC ATT AGG TTT 2316
Thr Asp Asp Ser Lys Glu Ala Gln Met Pro Ala Val Ile Arg Phe
735 740 745

TAGCGTCCCA TGAGCCTTGG TATCAAGAGG CCACAAGACT GGGACCCCAG GGGCTCCCCT 2376
CCCMtTTGA GCTCTTCCTG AATAAAGCCT CATACCCCTA AAAAAAAAAA AA 2428
Fig. 3.

                                               Number of
Sequence                             Repeat    substitutions

GAGGCCACCCCTGTGCCCCCCACAGGGGACTCC     1        6
GAGGCCACTCCCGTGCCCCCCACGGGTGACTCC     2
GAGACCGCCCCCGTGCCGCCCACGGGTGACTCC     3        3
GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC     4        0
GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC     5        0
GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC     6        0
GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC     7        0
GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC     8        0
GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC     9        0
GGCGCGGCCCGCGTGCCGCCCACGGGTGACGCC    10
GGGCCCCCCCCCGTGCCGCCCACGGGTGACTCC    11
GGCGCCCCCCCCGTGCCGCCCACGGGTGACTCC    12
GGGGCCCCCCCCGTGACCCCCACGGGTGACTCC    13
GAGACCGCCCCCGTGCCGCCCACGGGTGACTCC    14
GGGGCCCCCCCTGTGCCCCCCACGGGTGACTCT    15
GAGGCTGCCCCTGTGCCCCCCACAGATGACTCC    16

GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC    Consensus

Repeats 4-9 are identical to consensus.
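
The number of substitutions per repeat can be recomputed by comparing each 33-nucleotide repeat with the consensus position by position. The sketch below is illustrative: the function name is hypothetical, and only repeats 1-5 are used, as transcribed from the figure.

```python
CONSENSUS = "GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC"

# Repeats 1-5 as transcribed from Fig. 3.
REPEATS = {
    1: "GAGGCCACCCCTGTGCCCCCCACAGGGGACTCC",
    2: "GAGGCCACTCCCGTGCCCCCCACGGGTGACTCC",
    3: "GAGACCGCCCCCGTGCCGCCCACGGGTGACTCC",
    4: "GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC",
    5: "GGGGCCCCCCCCGTGCCGCCCACGGGTGACTCC",
}


def substitutions(repeat, consensus=CONSENSUS):
    """List (1-based position, consensus base, repeat base) for each mismatch."""
    if len(repeat) != len(consensus):
        raise ValueError("repeat and consensus must have the same length")
    return [
        (pos, c, r)
        for pos, (c, r) in enumerate(zip(consensus, repeat), start=1)
        if c != r
    ]


counts = {n: len(substitutions(seq)) for n, seq in REPEATS.items()}
for n, seq in REPEATS.items():
    changes = [f"{c}{pos}{r}" for pos, c, r in substitutions(seq)]
    print(n, counts[n], changes)
```

For repeats 1 and 3 to 5 the counts (6, 3, 0, 0) agree with the values that remain legible in the figure.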
Fig. 4.

Fig. 5 (1/3)

                                                          50
Bssl      MLTMGRLQLWLGLTCCWAVASAAKLGAVYTEGGFVEGVNKKLGLLG.DS
Ratlpl    ...HGRLEVLFLGLTCCIiAAACAAKLGALYTEGGFVEGVNKKLSLLGGDS
Bovceh    ....MLGASRLGPSPGCLAVASAAKLGSVYTEGGFVEGVNKKLSLFG.DS
Consensus ....grl....lgltcclaaaaAAKLGavYTEGGFVEGVNKKLsLlG.DS

                                                         100
Bssl      VDIFKGIPFAAPTKALENPQPHPGWQGTLKAKNFKKRCLQATITQDSTYG
Ratlpl    VDIFKGIPFA.TAKTLENPQRHPGWQGTLKATQFKKRCLQATITQDDTYG
Bovceh    VDIFKGIPFAAAPKALEKPKRHPGWQGTLKAKSFKKRCLQATLTQDSTYG
Consensus VDIFKGIPFAa..KaLEnPgrHPGWQGTLKAk.FKKRCLQATiTQDsTYG

                                                         150
Bssl      DEDCLYLNIWVPQGRKQVSRDLPVMIWIYGGAFLMGSGHGANFLNNYLYD
Ratlpl    QEDCLYLNIWVPQGRKQVSHDLPVMVWIYGGAFLMGSGQGANFLKNYLYD
Bovceh    NEDCLYLNIWVPQGRKEVSHDLPVMIWIYGGAFLMGASQGANFLSNYLYD
Consensus .EDCLYLNIWVPQGRKqVShDLPVMiWIYGGAFLMGsggGANFL.NYLYD

                                                         200
Bssl      GKKIATRGNVIWTFNYRVGPLGFLSTGDANLPGNYGLRDQHMAIAWVKR
Ratlpl    GKKIATRANVIWTFNYRVGPLGFLSTGDANLPGNFGLRDQHMAIAWVKR
Bovceh    GKKIATRGNVIWTFNYRVGPLGFLSTGDSNLPGNYGLWDQHMAIAWVKR
Consensus GKKIATRgNVIWTFNYRVGPLGFLSTGDaNLPGNyGLrDQHMAIAWVKR

                    #      *                             250
Bssl      NIAAFGGDPNNITLFGESAGGASVSLQTLSPYNKGLIRRAISQSGVALSP
Ratlpl    NIAAFGGDPNNITIFGESAGGAIVSLQTLSPYNKGLIRRAISQSGVALSP
Bovceh    NIEAFGGDPDNITLFGESAGGASVSLQTLSPYNKGLIKRGISQSGVGLCP
Consensus NIaAFGCDPDNITLFGESAGGAsVSLQTLSPYNKGLIrRaISQSGVaLsP

                                                         300
Bssl      WVIQKNPLFWAKKVAKKVGCPVGDAARMAQCLKVTDPRALTLAYKVPLAG
Ratlpl    WAIQENPLFWAKTIAKKVGCPTEDTAKMAGCLKITDPRALTLAYRLPLKS
Bovceh    WAIQQDPLFWAKRIAKKVGCPVDDTSKMAGCAKITDPRALTLAYKLPLGS
Consensus WaIQ.nPLFWAK.iAKKVGCPv.DtakMAgClKiTDPRALTLAYklPL.s

                                                         350
Bssl      LEYPMLHYVGFVPVIDGDFIPADPINLYANAADIDYIAGTNNMDGHIFAS
Ratlpl    QEYPIVEYLAFIPWDGDFIPDDPINLYDNAADIDYLAGINDMDGHLFAT
Bovceh    TEYPKLHYLSFVPVIDGDFIPDDPVNLYANAADVDYIAGTNDMDGHLFVG
Consensus .EYP.lHYl.FvPViDGDFIPdDPiNLYaNAADiDYiAGtNdMDGHlFa.
Fig. 5 (2/3)

                                                         400
Bssl      IDMPAINKGNKKVTEEDFYKLVSEFTITKGLRGAKTTFDVYTESWAQDPS
Ratlpl    VDVPAIDKAKQDVTEEDFYRLVSGHTVAKGLKGTQATFDIYTESWAQDPS
Bovceh    MDVPAINSNKQDVTEEDFYKLVSGLTVTKGLRGANATYEVYTEPWAQDSS
Consensus .DvPAInk.kqdVTEEDFYkLVSg.TvtKGLrGa.aTfdvYTEsWAQDpS

                                                         450
Bssl      QENKKKTWDFETDVLFLVPTEIALAQHRANAKSAKTYAYLFSHPSRMPV
Ratlpl    QENMKKTVVAFETDILFLIPTEMALAQHRAHAKSAKTYSYLFSHPSRMPI
Bovceh    QETRKKTMVDLETDILFLIPTKIAVAQHKSHAKSANTYTYLFSQPSRMPI
Consensus QEn.KKTvVdfETDiLFLiPTeiAlAQHrahAKSAkTY.YLFShPSRMPi

                                                         500
Bssl      YPKWVGADHADDIQYVFGKPFATPTGYRPQDRTVSKAMIAYWTNFAKTGD
Ratlpl    YPKWMGADHADDLQYVFGKPFATPLGYRAQDRTVSKAMIAYWTNFAKSGD
Bovceh    YPKWMGADHADDLQYVFGKPFATPLGYRAQDRTVSKAMIAYWTNFARTGD
Consensus YPKWmGADHADDlQYVFGKPFATPlGYRaQDRTVSKAMIAYWTNFAktGD

                                                         550
Bssl      PNMGDSAVPTHWEPYTTENSGYLEITKKHGSSSMKRSLRTNFLRYWTLTY
Ratlpl    PNMGNSPVPTHWYPYTMENGNYLDINKKITSTSMKEHLREKFLKFWAVTF
Bovceh    PNTGHSTVPANWDPYTLEDDNYLEINKQMDSNSMKLRTLTNYLQFWTQTY
Consensus PNnG.S.VPthi..PYT.Kn.nYlelnKkm.S.SMK.hLRtnfL.fWt.Ty



                                                         596
Bssl      LALPTVTDQ EATPVPPTGDS EATPVPPTGDS ETAPVPPTGDS GAPP
Ratlpl    EMLPTV... VGDHTPPKDDS EAAPVPPTDDS DGGPVPPTDDS QTTP
Bovceh    QALPTVTSA GASLLPPEDNS QASPVPPADNS GAPTEPSAGDS
Consensus .aLPTVt.  .PP.ddS.eA.PVPPtddS.  .pvPptgDS.

                                                         642
Bssl      VPPTGDS GAPPVPPTGDS GAPPVPPTGDS GAPPVPPTGDS GAPPVP
Ratlpl    VPPTDNS QA
Bovceh
Consensus

                                                          689
Bssl      PTGDS GAPPVPPTGDS GAPPVPPTGDS GPPPVTPTGDS GAPPVPPTG
Ratlpl
Bovceh
Consensus

                                                         735
Bssl      DS GAPPVTPTGDS ETAPVPPTGDS GAPPVPPTGDS EAAPVPPTDDS
Ratlpl
Bovceh
Consensus .ds

Bssl      KE.AQMPAVIRF
Ratlpl    VE.AQMPGPIGF
Bovceh    .EVAQMPWIGF
Consensus .e.AQMP.vIgF
(95-113)     BSSL
(134-150)    ChesHum
(132-150)    Torpace
(118-136)    Drosceh
(121-138)    Ratlivce
(273-291)    Drosace
(2270-2291)  ThyrHum
             Dict.Di
             Consensus